zip = read.csv("/course/data/zip/zip.csv")
# head(zip)
dim(zip)
## [1] 9298 257
The dataset comes from the Zip code collection available at the source linked above. It is a 9298 x 257 matrix: 9298 observations of 257 variables.
Each observation is a handwritten digit, collected to benchmark classification algorithms. The variable 'digit' records the true digit (0 to 9), and the remaining 256 variables (p1 to p256) hold the pixel intensities of the 16 x 16 grayscale image of that digit. The intensities lie between -1 and 1, which suggests the images have been normalised.
In Lab 7 we applied a range of machine learning methods to this dataset, including Random Forests, K-means clustering, and Support Vector Machines. Lab 8 turns to deep learning, in particular neural networks built with the 'keras' library. The goal throughout is to recognise handwritten digits. For some tasks the models use only 2 predictors, giving a simple two-dimensional view of the problem; for others they use all 256 predictors.
The data mining problems examined in this lab are:
1. How the number of predictors influences neural network performance.
2. Tuning the architecture of the neural network to improve the classification of handwritten digits.
3. Evaluating the trained models on the test data to assess their robustness and generalisation.
In short, the lab connects raw pixel data to meaningful classification with neural networks, while comparing the effect of the number of predictors used.
# Transform data for image generation:
image_data = (1 - as.matrix(zip[,-1]))
dim(image_data) = c(nrow(zip), 16, 16)
image_data = aperm(image_data, c(1,3,2))
# Extract labels (digit values) for the images
digit_labels = as.numeric(zip[,1])
# Create an empty matrix to hold the combined image
full_image = matrix(0, nrow = 160, ncol = 160) # Initialize with 0 (white)
# Loop to populate the combined image matrix
for(digit in 0:9) {
  # Pick 10 random examples of this digit (set.seed would make this reproducible)
  random_samples = sample(which(digit_labels == digit), 10)
  for(index in 1:10) {
    sample_id = random_samples[index]
    # Each digit occupies one row of 16-pixel tiles; each sample one column
    row_start = (digit * 16) + 1
    col_start = (index - 1) * 16 + 1
    full_image[row_start:(row_start+15), col_start:(col_start+15)] = image_data[sample_id,,]
  }
}
# Rotate the image 90 degrees clockwise around its center
rotated_image = t(apply(full_image, 2, rev))
# Display the rotated image
par(mar = c(0,0,0,0))
image(rotated_image, axes = FALSE, col = c("white", "black"))
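The rotation step above can be checked on a toy matrix: `t(apply(m, 2, rev))` reverses each column and transposes, which rotates a matrix 90 degrees clockwise. A minimal sketch using a 2 x 2 example:

```r
# Rotate a matrix 90 degrees clockwise: reverse each column, then transpose
m <- matrix(1:4, nrow = 2, byrow = TRUE)  # rows: 1 2 / 3 4
rotated <- t(apply(m, 2, rev))
rotated                                   # rows: 3 1 / 4 2
```

The bottom-left entry (3) moves to the top-left, as expected for a clockwise turn.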
Train a neural network by minimising the cross-entropy objective function. The network has one hidden layer with 3 units.
Compute its training and test errors.
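For one-hot labels, the categorical cross-entropy minimised here is -sum(y * log(p)) per observation. A minimal sketch of the calculation (the helper name `cross_entropy` is ours, not a keras function):

```r
# Categorical cross-entropy for one observation: -sum(y_true * log(p_hat))
cross_entropy <- function(y_true, p_hat) -sum(y_true * log(p_hat))
cross_entropy(c(0, 1), c(0.2, 0.8))  # -log(0.8), about 0.223
cross_entropy(c(0, 1), c(0.8, 0.2))  # -log(0.2), about 1.609: a worse prediction gives a larger loss
```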
library(neuralnet)
library(NeuralNetTools)
library(keras)
# Keep only the digits 4 and 9 for the binary classification tasks
zip <- subset(zip, digit == 4 | digit == 9)
select_data <- zip[, c("digit", "p9", "p24")]
# Recode labels: digit 4 becomes class 0 and digit 9 becomes class 1
select_data$digit <- ifelse(select_data$digit == 4, 0, 1)
# Splitting the dataset into training and test data
train_data <- select_data[1:1000,]
test_data <- select_data[1001:nrow(select_data),]
# Preprocessing data for Keras
xmat = as.matrix(train_data[, c("p9", "p24")])
y = train_data$digit
ymat = to_categorical(y, 2)
x2mat = as.matrix(test_data[, c("p9", "p24")])
y2 = test_data$digit
# Building the neural network model with one hidden layer of 3 units
model3 =
keras_model_sequential() %>%
layer_dense(units=3, activation="relu", input_shape=c(2)) %>%
layer_dense(units=2, activation="softmax")
model3 %>% compile(loss="categorical_crossentropy",
optimizer=optimizer_rmsprop(),
metrics=c("accuracy"))
# Training the model
model3 %>% fit(xmat, ymat, epochs=200, batch_size=32, validation_split=0, verbose=0)
# Predictions
yhat = model3 %>% predict(xmat) %>% k_argmax() %>% as.integer()
## 32/32 - 0s - 136ms/epoch - 4ms/step
yhat2 = model3 %>% predict(x2mat) %>% k_argmax() %>% as.integer()
## 22/22 - 0s - 44ms/epoch - 2ms/step
# Calculating training and test errors
train_error = mean(yhat != y)
test_error = mean(yhat2 != y2)
# Print errors
cat("Training Error:", train_error, "\n")
## Training Error: 0.066
cat("Test Error:", test_error, "\n")
## Test Error: 0.07429421
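`predict()` returns one column of class probabilities per class, and `k_argmax()` picks the column with the highest probability, 0-based. The same mapping can be sketched in base R with `max.col` (an illustration only, not part of the pipeline above):

```r
# Map a matrix of class probabilities to 0-based class labels
probs <- matrix(c(0.9, 0.1,
                  0.3, 0.7), nrow = 2, byrow = TRUE)
max.col(probs) - 1  # 0 1: row 1 is classified as class 0, row 2 as class 1
```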
Train a neural network with two hidden layers, containing 2 and 3 units respectively. Compute its training and test errors.
# Convert data frames to matrices
train_matrix <- as.matrix(train_data[, c("p9", "p24")])
test_matrix <- as.matrix(test_data[, c("p9", "p24")])
# Define the model using Keras R interface
model4 =
keras_model_sequential() %>%
layer_dense(units=2, activation="relu", input_shape=c(2)) %>%
layer_dense(units=3, activation="relu") %>%
layer_dense(units=2, activation="softmax")
# Compile the model
model4 %>% compile(
loss = "categorical_crossentropy",
optimizer = optimizer_rmsprop(),
metrics = c("accuracy")
)
# Train the model
model4 %>% fit(train_matrix, to_categorical(train_data$digit, 2), epochs=200, batch_size=32, validation_split=0, verbose=0)
# Predictions
yhat_train = model4 %>% predict(train_matrix) %>% k_argmax() %>% as.integer()
## 32/32 - 0s - 88ms/epoch - 3ms/step
yhat_test = model4 %>% predict(test_matrix) %>% k_argmax() %>% as.integer()
## 22/22 - 0s - 50ms/epoch - 2ms/step
# Compute errors
train_error = mean(yhat_train != train_data$digit)
test_error = mean(yhat_test != test_data$digit)
# Print errors
cat("Training Error:", train_error, "\n")
## Training Error: 0.067
cat("Test Error:", test_error, "\n")
## Test Error: 0.07280832
plot_decision_boundary <- function(model, data=xmat, class=train_data$digit, add=FALSE, col=4) {
  if(!add) plot(data[,1], data[,2], col=class+2, xlab="p9", ylab="p24")
  x = seq(min(data[,1]), max(data[,1]), len=101)
  y = seq(min(data[,2]), max(data[,2]), len=101)
  # Predicted probability of class 0 over a 101 x 101 grid
  f = function(d, m=model) {
    d = as.matrix(d)
    (m %>% predict(d))[,1]
  }
  z = matrix(f(expand.grid(x, y)), nrow=101)
  # The decision boundary is where the class-0 probability equals 0.5
  contour(x, y, z, levels=0.5, lwd=3, lty=1, drawlabels=FALSE, add=TRUE, col=col)
}
par(mfrow=c(1,2))
plot_decision_boundary(model3, col=4)
## 319/319 - 0s - 341ms/epoch - 1ms/step
title("Task 3 Decision Boundary")
plot_decision_boundary(model4, col=5)
## 319/319 - 0s - 351ms/epoch - 1ms/step
title("Task 4 Decision Boundary")
summary(model3)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_1 (Dense) (None, 3) 9
## dense (Dense) (None, 2) 8
## ================================================================================
## Total params: 17
## Trainable params: 17
## Non-trainable params: 0
## ________________________________________________________________________________
Hidden layer: 9 parameters. With two input features and 3 units, the number of weights is 2 x 3 = 6, and the number of biases equals the number of units, 3. Total: 6 + 3 = 9, consistent with the summary output.
Output layer: 8 parameters. Its input is the 3-unit hidden layer, so the number of weights is 3 x 2 = 6, and there are 2 biases, one per unit. Total: 6 + 2 = 8, consistent with the summary output.
Conclusion:
The network has 17 parameters in total, consistent with the Total params: 17 reported by summary. The summary function lists the output shape and parameter count of each layer, which lets us verify the model's structure; the per-layer counts calculated by hand above match the reported values exactly.
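The by-hand calculation above can be condensed into a one-line helper (`dense_params` is our name for it, not a keras function):

```r
# Parameters in a dense layer: one weight per input-unit pair, one bias per unit
dense_params <- function(inputs, units) inputs * units + units
dense_params(2, 3)                       # hidden layer: 9
dense_params(3, 2)                       # output layer: 8
dense_params(2, 3) + dense_params(3, 2)  # total: 17, matching summary(model3)
```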
train_all_data <- zip[1:1000,]
test_all_data <- zip[1001:nrow(zip),]
# Preprocess the data for the CNN:
# rescale pixel intensities from [-1, 1] to [0, 1] and reshape each row
# into a 16 x 16 x 1 array (height, width, channels)
xmat_train = (as.matrix(train_all_data[,-1]) + 1) / 2 # training data
dim(xmat_train) = c(nrow(train_all_data), 16, 16, 1)
xmat_train = aperm(xmat_train, c(1,3,2,4))
xmat_test = (as.matrix(test_all_data[,-1]) + 1) / 2 # test data
dim(xmat_test) = c(nrow(test_all_data), 16, 16, 1)
xmat_test = aperm(xmat_test, c(1,3,2,4))
# Binary labels: digit 4 becomes class 0, digit 9 becomes class 1
y_train = ifelse(train_all_data[,1] == 4, 0, 1)
y_test = ifelse(test_all_data[,1] == 4, 0, 1)
# Build the model
model =
keras_model_sequential() %>%
layer_conv_2d(filters=32, kernel_size=c(3,3), padding="same", activation="relu", input_shape=c(16,16,1)) %>%
layer_max_pooling_2d(pool_size=c(2,2)) %>%
layer_conv_2d(filters=64, kernel_size=c(3,3), padding="same", activation="relu") %>%
layer_max_pooling_2d(pool_size=c(2,2)) %>%
layer_conv_2d(filters=128, kernel_size=c(3,3), padding="same", activation="relu") %>%
layer_max_pooling_2d(pool_size=c(2,2)) %>%
layer_flatten() %>%
layer_dropout(rate=0.5) %>%
layer_dense(units=256, activation="relu") %>%
layer_dense(units=2, activation="softmax")
model %>% compile(loss="sparse_categorical_crossentropy",
optimizer=optimizer_rmsprop(),
metrics=c("accuracy"))
# Train the model
i = sample(nrow(xmat_train))
system.time(history <- model %>% fit(xmat_train[i,,,,drop=FALSE], y_train[i], epochs=200, batch_size=64, validation_split=0.3,verbose=0))
## user system elapsed
## 205.658 11.049 37.453
plot(history)
# Predictions and errors
yhat_train = model %>% predict(xmat_train) %>% k_argmax() %>% as.integer()
## 32/32 - 0s - 185ms/epoch - 6ms/step
yhat_test = model %>% predict(xmat_test) %>% k_argmax() %>% as.integer()
## 22/22 - 0s - 87ms/epoch - 4ms/step
train_error = mean(yhat_train != y_train)
test_error = mean(yhat_test != y_test)
cat("Training Error:", train_error, "\n")
## Training Error: 0.002
cat("Test Error:", test_error, "\n")
## Test Error: 0.001485884
# For the ten-class task we need all ten digits, so reload the full dataset
# (zip was subset to digits 4 and 9 earlier)
zip_full = read.csv("/course/data/zip/zip.csv")
train_all_data <- zip_full[1:1000,]
test_all_data <- zip_full[1001:nrow(zip_full),]
# Prepare the training data: rescale to [0, 1] and reshape to 16 x 16 x 1
X = (as.matrix(train_all_data[,-1]) + 1) / 2
dim(X) = c(nrow(train_all_data), 16, 16, 1)
X = aperm(X, c(1,3,2,4))
y = as.numeric(train_all_data[,1])
ymat = to_categorical(y, 10)
# Prepare the test data
X2 = (as.matrix(test_all_data[,-1]) + 1) / 2
dim(X2) = c(nrow(test_all_data), 16, 16, 1)
X2 = aperm(X2, c(1,3,2,4))
y2 = as.numeric(test_all_data[,1])
# Build the model for 10 classes
model =
keras_model_sequential() %>%
layer_conv_2d(filters=32, kernel_size=c(3,3), padding="same", activation="relu", input_shape=c(16,16,1)) %>%
layer_max_pooling_2d(pool_size=c(2,2)) %>%
layer_conv_2d(filters=64, kernel_size=c(3,3), padding="same", activation="relu") %>%
layer_max_pooling_2d(pool_size=c(2,2)) %>%
layer_conv_2d(filters=128, kernel_size=c(3,3), padding="same", activation="relu") %>%
layer_max_pooling_2d(pool_size=c(2,2)) %>%
layer_flatten() %>%
layer_dropout(rate=0.5) %>%
layer_dense(units=256, activation="relu") %>%
layer_dense(units=10, activation="softmax")
# Compile the model
model %>% compile(loss="categorical_crossentropy", optimizer=optimizer_rmsprop(), metrics=c("accuracy"))
# Train the model
i = sample(nrow(X))
system.time(history <- model %>% fit(X[i,,,,drop=FALSE], ymat[i,], epochs=200, batch_size=64, validation_split=0.3,verbose=0))
## user system elapsed
## 204.693 11.627 37.195
plot(history)
# Predictions on training and test data
yhat = model %>% predict(X) %>% k_argmax() %>% as.integer()
## 32/32 - 0s - 196ms/epoch - 6ms/step
yhat2 = model %>% predict(X2) %>% k_argmax() %>% as.integer()
## 22/22 - 0s - 98ms/epoch - 4ms/step
# Calculate training and test errors
(errors = c(mean(yhat != y), mean(yhat2 != y2)))
## [1] 0.003000000 0.001485884
summary(model)
## Model: "sequential_3"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d_5 (Conv2D) (None, 16, 16, 32) 320
## max_pooling2d_5 (MaxPooling2D) (None, 8, 8, 32) 0
## conv2d_4 (Conv2D) (None, 8, 8, 64) 18496
## max_pooling2d_4 (MaxPooling2D) (None, 4, 4, 64) 0
## conv2d_3 (Conv2D) (None, 4, 4, 128) 73856
## max_pooling2d_3 (MaxPooling2D) (None, 2, 2, 128) 0
## flatten_1 (Flatten) (None, 512) 0
## dropout_1 (Dropout) (None, 512) 0
## dense_8 (Dense) (None, 256) 131328
## dense_7 (Dense) (None, 10) 2570
## ================================================================================
## Total params: 226,570
## Trainable params: 226,570
## Non-trainable params: 0
## ________________________________________________________________________________
MaxPooling2D, Flatten, and Dropout layers have 0 parameters because they have no weights or biases to learn; they only transform their input. The model's total parameter count is the sum over all layers, which is the 226,570 reported by summary.
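The non-zero counts in the summary can be reproduced by hand: a conv layer has (kernel height x kernel width x input channels + 1 bias) parameters per filter, and a dense layer has inputs x units weights plus one bias per unit. A sketch checking every layer (the helper names are ours):

```r
# Parameters per layer type
conv_params  <- function(kh, kw, in_ch, filters) (kh * kw * in_ch + 1) * filters
dense_params <- function(inputs, units) inputs * units + units
total <- conv_params(3, 3, 1, 32) +    # 320
  conv_params(3, 3, 32, 64) +          # 18496
  conv_params(3, 3, 64, 128) +         # 73856
  dense_params(512, 256) +             # 131328 (flattened 2 x 2 x 128 = 512 inputs)
  dense_params(256, 10)                # 2570
total  # 226570, matching Total params in the summary
```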
Over the course of these tasks we explored image recognition with neural networks, focusing in particular on convolutional neural networks (CNNs). We moved from small fully-connected networks trained on two predictors to CNNs trained on the full 16 x 16 images, refined the R code used to train these networks for binary and multi-class classification, and worked through how the parameter counts of the various layers are computed. Throughout, the emphasis remained on accurate image classification and effective model training, illustrating the capabilities of CNNs in this domain.